gut = read.csv("gut_diversity.csv", header=TRUE)

gut$age = scale(gut$age)
gut$exercise = scale(gut$exercise)
gut$weight = scale(gut$weight)

gut$subject = as.factor(gut$subject)
summary(gut)
##     subject     diversity            age.V1            exercise.V1     
##  1      : 1   Min.   :0.420   Min.   :-1.5018075   Min.   :-1.2373974  
##  2      : 1   1st Qu.:2.777   1st Qu.:-0.8431200   1st Qu.:-0.8540849  
##  3      : 1   Median :4.255   Median :-0.1844325   Median :-0.0928588  
##  4      : 1   Mean   :4.688   Mean   : 0.0000000   Mean   : 0.0000000  
##  5      : 1   3rd Qu.:5.992   3rd Qu.: 0.7706644   3rd Qu.: 0.4092265  
##  6      : 1   Max.   :9.460   Max.   : 1.9233674   Max.   : 2.0450529  
##  (Other):14                                                            
##       weight.V1      
##  Min.   :-1.3732613  
##  1st Qu.:-0.8364117  
##  Median :-0.1599812  
##  Mean   : 0.0000000  
##  3rd Qu.: 0.6936097  
##  Max.   : 1.9122584  
## 

In Class Work

Ancova is anova plus regression analysis.

Fit a linear model using lm(), with both IVs but no interaction. Look at the intercept and slopes.

model.both_iv = lm(diversity ~ age + exercise, data=gut)
summary(model.both_iv)
## 
## Call:
## lm(formula = diversity ~ age + exercise, data = gut)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.14565 -1.18318  0.02118  0.93561  2.62092 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.68750    0.32238  14.540 5.07e-11 ***
## age          0.04636    0.33732   0.137    0.892    
## exercise     2.39877    0.33732   7.111 1.75e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.442 on 17 degrees of freedom
## Multiple R-squared:  0.7572, Adjusted R-squared:  0.7286 
## F-statistic: 26.51 on 2 and 17 DF,  p-value: 5.951e-06

Fit a separate simple linear regression model using lm() for each IV. Compare the intercept and slopes to the multiple regression model.

model.age_only = lm(diversity ~ age, data=gut)
model.exercise_only = lm(diversity ~ exercise, data=gut)

summary(model.age_only)
## 
## Call:
## lm(formula = diversity ~ age, data = gut)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.4892 -2.0048 -0.5452  1.3581  5.1851 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.6875     0.6246   7.505 6.01e-07 ***
## age           0.5175     0.6408   0.807     0.43    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.793 on 18 degrees of freedom
## Multiple R-squared:  0.03496,    Adjusted R-squared:  -0.01866 
## F-statistic: 0.652 on 1 and 18 DF,  p-value: 0.4299
summary(model.exercise_only)
## 
## Call:
## lm(formula = diversity ~ exercise, data = gut)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.11174 -1.16199  0.00008  0.89369  2.60417 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.6875     0.3135  14.954 1.36e-11 ***
## exercise      2.4079     0.3216   7.487 6.22e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.402 on 18 degrees of freedom
## Multiple R-squared:  0.7569, Adjusted R-squared:  0.7434 
## F-statistic: 56.05 on 1 and 18 DF,  p-value: 6.217e-07

Intercept is same for both. Slope changes.

Simple linear regression of diversity (DV) on weight (IV). Look at significance, either with summary() or anova()

model.weight_only = lm(diversity ~ weight, data=gut)
summary(model.weight_only)
## 
## Call:
## lm(formula = diversity ~ weight, data = gut)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2086 -0.9025 -0.0632  0.6016  3.2008 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.6875     0.4094  11.449 1.07e-09 ***
## weight       -2.1175     0.4200  -5.041 8.49e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.831 on 18 degrees of freedom
## Multiple R-squared:  0.5854, Adjusted R-squared:  0.5623 
## F-statistic: 25.41 on 1 and 18 DF,  p-value: 8.494e-05

Multiple linear regression of diversity (DV) on exercise (IV1) and weight (IV2), no interaction. Look at significance of weight now.

model.ex_wt = lm(diversity ~ weight + exercise, data = gut)
summary(model.ex_wt)
## 
## Call:
## lm(formula = diversity ~ weight + exercise, data = gut)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.99210 -0.92707  0.05392  0.86199  2.50362 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.6875     0.3191  14.689 4.31e-11 ***
## weight       -0.3606     0.5930  -0.608  0.55118    
## exercise      2.1073     0.5930   3.554  0.00244 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.427 on 17 degrees of freedom
## Multiple R-squared:  0.7621, Adjusted R-squared:  0.7341 
## F-statistic: 27.23 on 2 and 17 DF,  p-value: 5.004e-06

Adding exercise changed the contribution of weight significantly. Looks like exercise complicated things a bit.

Plot weight v. exercise to see what’s going on.

ggplot(
  data=gut,
  aes(
    x = weight,
    y = exercise
    )
  ) + 
  geom_point()

cor(gut$weight, gut$exercise)
##            [,1]
## [1,] -0.8337587

Looks like there’s a correlation between the 2 IVs. That explains what we’re seeing with the diminished effect of weight.

One option: just drop one of the variables. Another option: model the interaction

# pairs just shows a plot of each variable against each other. Sort of
# nice for quick visualization.
pairs(~age+exercise+weight, data=gut)

# pairs.panels (relies on psych package, can't handle ~ inputs) is also cool!
pairs.panels(gut[ ,3:5])

Homework 10a

Fit the gut OTU Shannon diversity model with an interaction (DV = diversity, IV = age, exercise, and their interaction).

model.age.ex = lm(diversity ~ age * exercise, data=gut)

Test the IVs. There is an interaction, so type III Anova() would probably be best so age * exercise doesn’t give different results than exercise * age… But good news, if ALL your IVs are continuous, you don’t have to worry about changing the contrasts statement. - type III Anova() and summary() give same p-values - anova() is still order-dependent unless perfectly balanced (very unlikely with continuous IVs)

Anova(model.age.ex, type=3)
## Anova Table (Type III tests)
## 
## Response: diversity
##              Sum Sq Df  F value    Pr(>F)    
## (Intercept)  439.37  1 228.8487 6.728e-11 ***
## age            0.00  1   0.0014    0.9706    
## exercise     109.56  1  57.0647 1.157e-06 ***
## age:exercise   4.62  1   2.4048    0.1405    
## Residuals     30.72 16                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(model.age.ex)
## 
## Call:
## lm(formula = diversity ~ age * exercise, data = gut)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6555 -1.0236 -0.1549  0.9502  2.3121 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.78748    0.31647  15.128 6.73e-11 ***
## age           0.01216    0.32494   0.037    0.971    
## exercise      2.52488    0.33424   7.554 1.16e-06 ***
## age:exercise -0.53591    0.34559  -1.551    0.141    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.386 on 16 degrees of freedom
## Multiple R-squared:  0.7889, Adjusted R-squared:  0.7493 
## F-statistic: 19.93 on 3 and 16 DF,  p-value: 1.185e-05

I know the interaction is not significant, but pretend that it is. How would you describe, in words (not numbers) the OTU diversity relationship with age and exercise? Include your description in the text of your Rmd file.

Gut microbe diversity has a string, positive correlation with exercise. There is no statistically significant correlation between age and microbe diversity or between microbe diversity and the interaction between age and exercise.

It is hard to plot when you have >1 continuous IV. Try to come up with a way to do this.

gut2 = read.csv("gut_diversity.csv", header=TRUE)

fig <- plot_ly(
  type="scatter3d",
  mode="markers",
  gut2,
  x = ~weight, 
  y = ~age, 
  z = ~exercise,
  color = ~diversity
  )

fig